Search for: All records

Creators/Authors contains: "Andrew Schneider, Lihong He"


  1. Social media is the ultimate challenge for many natural language processing tools. The constant emergence of new linguistic constructs challenges even the most sophisticated NLP tools. Predicting word embeddings for out-of-vocabulary words is one of those challenges. Word embedding models only include terms that occur a sufficient number of times in their training corpora, so they cannot directly provide any useful information about a word that is not in their vocabularies. We propose a fast method for predicting vectors for out-of-vocabulary terms that makes use of the terms surrounding the unknown term and the hidden context layer of the word2vec model. We propose this method as a strong baseline in the sense that 1) while it does not surpass all state-of-the-art methods, it surpasses several techniques for vector prediction on benchmark tasks, 2) even when it underperforms, the margin is very small, so it retains competitive performance in downstream tasks, and 3) it is inexpensive to compute, requiring no additional training stage. We also show that our technique can be incorporated into existing methods to achieve a new state of the art on the word vector prediction problem. 
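
The abstract does not spell out the exact computation, so the following is only a minimal sketch of the general idea, under stated assumptions: the vector for an unknown term is estimated by averaging the context-layer (output-side) word2vec vectors of the known words around it. The function name `predict_oov_vector`, the toy vocabulary, the 2-dimensional `context_matrix`, and the window size are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def predict_oov_vector(context_words, vocab, context_matrix):
    """Estimate a vector for an unknown word from its neighbours.

    Assumption: the estimate is the plain mean of the context-layer
    vectors of the surrounding in-vocabulary words. The paper's actual
    weighting scheme may differ.
    """
    rows = [vocab[w] for w in context_words if w in vocab]
    if not rows:
        # No known neighbours: fall back to a zero vector.
        return np.zeros(context_matrix.shape[1])
    return context_matrix[rows].mean(axis=0)


# Toy stand-in for a trained model's context (output) layer.
vocab = {"the": 0, "cat": 1, "sat": 2, "mat": 3}
context_matrix = np.array([
    [1.0, 0.0],  # the
    [0.0, 1.0],  # cat
    [1.0, 1.0],  # sat
    [2.0, 0.0],  # mat
])

sentence = ["the", "cat", "blorped", "sat", "mat"]
i = sentence.index("blorped")  # position of the out-of-vocabulary word
window = 2                     # hypothetical context window size
neighbours = sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]

vec = predict_oov_vector(neighbours, vocab, context_matrix)
print(vec)
```

Because the estimate is a single lookup and an average over an already trained matrix, it needs no additional training stage, which matches the cost argument made in the abstract.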